Overview

Dataset statistics

Number of variables: 10
Number of observations: 5680
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 443.9 KiB
Average record size in memory: 80.0 B
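
As a rough consistency check on the two memory figures, the sketch below assumes that each of the 10 numeric columns stores 8 bytes per value. That width is an assumption (the report does not state the column dtypes); the small gap between the computed and reported totals is presumably index overhead.

```python
# Figures reported in the dataset statistics.
n_variables = 10
n_observations = 5680
total_kib = 443.9

# Assumption: every numeric column uses 8 bytes per value (e.g. int64 or float64).
bytes_per_value = 8

record_bytes = n_variables * bytes_per_value        # bytes per row
data_kib = record_bytes * n_observations / 1024     # column data only, in KiB
avg_record_bytes = total_kib * 1024 / n_observations

print(record_bytes)                  # 80
print(data_kib)                      # 443.75
print(round(avg_record_bytes, 1))    # 80.0
```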

Variable types

Numeric: 10

Alerts

gross_revenue is highly correlated with purchases_no and 2 other fields High correlation
recency_days is highly correlated with purchases_no High correlation
purchases_no is highly correlated with gross_revenue and 6 other fields High correlation
products_no is highly correlated with gross_revenue and 2 other fields High correlation
items_no is highly correlated with gross_revenue and 3 other fields High correlation
frequency is highly correlated with purchases_no and 1 other field High correlation
returns_no is highly correlated with purchases_no and 1 other field High correlation
satisfaction_rate is highly correlated with purchases_no and 1 other field High correlation
gross_revenue is highly correlated with purchases_no and 1 other field High correlation
purchases_no is highly correlated with gross_revenue and 2 other fields High correlation
products_no is highly correlated with purchases_no High correlation
items_no is highly correlated with gross_revenue and 1 other field High correlation
gross_revenue is highly correlated with purchases_no and 2 other fields High correlation
purchases_no is highly correlated with gross_revenue and 2 other fields High correlation
products_no is highly correlated with gross_revenue and 1 other field High correlation
items_no is highly correlated with gross_revenue and 2 other fields High correlation
frequency is highly correlated with purchases_no High correlation
returns_no is highly correlated with satisfaction_rate High correlation
satisfaction_rate is highly correlated with returns_no High correlation
df_index is highly correlated with customer_id and 1 other field High correlation
customer_id is highly correlated with df_index and 1 other field High correlation
gross_revenue is highly correlated with purchases_no and 3 other fields High correlation
recency_days is highly correlated with df_index and 1 other field High correlation
purchases_no is highly correlated with gross_revenue and 3 other fields High correlation
products_no is highly correlated with gross_revenue and 3 other fields High correlation
items_no is highly correlated with gross_revenue and 3 other fields High correlation
returns_no is highly correlated with gross_revenue and 3 other fields High correlation
gross_revenue is highly skewed (γ1 = 22.98688033) Skewed
items_no is highly skewed (γ1 = 25.07233183) Skewed
returns_no is highly skewed (γ1 = 30.93860988) Skewed
df_index is uniformly distributed Uniform
df_index has unique values Unique
customer_id has unique values Unique
returns_no has 4191 (73.8%) zeros Zeros
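
The γ1 values in the skew alerts are skewness coefficients. The sketch below shows the standard moment-based definition (third central moment divided by the 1.5 power of the second central moment). It is an illustration only: the report may use a sample-size-adjusted estimator, so its values need not match this formula exactly. The function name `skewness` and both sample lists are hypothetical.

```python
def skewness(values):
    """Moment-based skewness: m3 / m2**1.5, using population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n   # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]                 # no tail on either side
right_tailed = [0, 0, 0, 0, 1, 1, 2, 50]    # many zeros plus one large value

print(skewness(symmetric))          # 0.0
print(skewness(right_tailed) > 1)   # True
```

A column such as returns_no, with mostly zeros and a few very large values, has the same shape as the second list, which is why it triggers both the Zeros and the Skewed alerts.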

Reproduction

Analysis started: 2022-08-30 14:27:16.689574
Analysis finished: 2022-08-30 14:27:50.745673
Duration: 34.06 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json

Variables

df_index
Real number (ℝ≥0)

HIGH CORRELATION
UNIFORM
UNIQUE

Distinct: 5680
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2888.28662
Minimum: 0
Maximum: 5770
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 44.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 288.95
Q1: 1450.75
Median: 2890.5
Q3: 4329.25
95-th percentile: 5480.05
Maximum: 5770
Range: 5770
Interquartile range (IQR): 2878.5

Descriptive statistics

Standard deviation: 1664.582105
Coefficient of variation (CV): 0.576321648
Kurtosis: -1.196208613
Mean: 2888.28662
Median Absolute Deviation (MAD): 1439.5
Skewness: -0.003557210241
Sum: 16405468
Variance: 2770833.583
Monotonicity: Strictly increasing

Histogram with fixed size bins (bins=50)

Common values

Value                  Count  Frequency (%)
0                      1      < 0.1%
3879                   1      < 0.1%
3855                   1      < 0.1%
3854                   1      < 0.1%
3853                   1      < 0.1%
3852                   1      < 0.1%
3851                   1      < 0.1%
3850                   1      < 0.1%
3849                   1      < 0.1%
3848                   1      < 0.1%
Other values (5670)    5670   99.8%

Minimum 10 values

Value                  Count  Frequency (%)
0                      1      < 0.1%
1                      1      < 0.1%
2                      1      < 0.1%
3                      1      < 0.1%
4                      1      < 0.1%
5                      1      < 0.1%
6                      1      < 0.1%
7                      1      < 0.1%
8                      1      < 0.1%
9                      1      < 0.1%

Maximum 10 values

Value                  Count  Frequency (%)
5770                   1      < 0.1%
5769                   1      < 0.1%
5768                   1      < 0.1%
5767                   1      < 0.1%
5766                   1      < 0.1%
5765                   1      < 0.1%
5764                   1      < 0.1%
5763                   1      < 0.1%
5762                   1      < 0.1%
5761                   1      < 0.1%
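
Several of the figures reported for each variable are derived from others: the range is maximum minus minimum, the IQR is Q3 minus Q1, the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. The sketch below checks these relations against the values reported for df_index; the variable names are illustrative only.

```python
# Values reported for df_index in the statistics above.
minimum, maximum = 0, 5770
q1, q3 = 1450.75, 4329.25
mean, std = 2888.28662, 1664.582105

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range (IQR)
cv = std / mean                   # Coefficient of variation (CV)
variance = std ** 2               # Variance

print(value_range)   # 5770
print(iqr)           # 2878.5
print(cv)            # close to the reported 0.576321648
print(variance)      # close to the reported 2770833.583
```

The last two results agree with the report only up to the rounding of the printed inputs.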

customer_id
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 5680
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 16605.09894
Minimum: 12347
Maximum: 22709
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 44.5 KiB

Quantile statistics

Minimum: 12347
5-th percentile: 12702.95
Q1: 14291.75
Median: 16231
Q3: 18212.25
95-th percentile: 21743.2
Maximum: 22709
Range: 10362
Interquartile range (IQR): 3920.5

Descriptive statistics

Standard deviation: 2808.520003
Coefficient of variation (CV): 0.1691359993
Kurtosis: -0.8231342656
Mean: 16605.09894
Median Absolute Deviation (MAD): 1960
Skewness: 0.4403964428
Sum: 94316962
Variance: 7887784.608
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                  Count  Frequency (%)
17850                  1      < 0.1%
15007                  1      < 0.1%
21087                  1      < 0.1%
21086                  1      < 0.1%
15578                  1      < 0.1%
12424                  1      < 0.1%
21084                  1      < 0.1%
17837                  1      < 0.1%
21081                  1      < 0.1%
14327                  1      < 0.1%
Other values (5670)    5670   99.8%

Minimum 10 values

Value                  Count  Frequency (%)
12347                  1      < 0.1%
12348                  1      < 0.1%
12349                  1      < 0.1%
12350                  1      < 0.1%
12352                  1      < 0.1%
12353                  1      < 0.1%
12354                  1      < 0.1%
12355                  1      < 0.1%
12356                  1      < 0.1%
12357                  1      < 0.1%

Maximum 10 values

Value                  Count  Frequency (%)
22709                  1      < 0.1%
22708                  1      < 0.1%
22707                  1      < 0.1%
22706                  1      < 0.1%
22705                  1      < 0.1%
22704                  1      < 0.1%
22700                  1      < 0.1%
22699                  1      < 0.1%
22696                  1      < 0.1%
22695                  1      < 0.1%

gross_revenue
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct: 5438
Distinct (%): 95.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1764.006007
Minimum: 0.42
Maximum: 279138.02
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 44.5 KiB

Quantile statistics

Minimum: 0.42
5-th percentile: 13.3735
Q1: 237.5475
Median: 615.265
Q3: 1571.07
95-th percentile: 5307.991
Maximum: 279138.02
Range: 279137.6
Interquartile range (IQR): 1333.5225

Descriptive statistics

Standard deviation: 7525.596757
Coefficient of variation (CV): 4.266196786
Kurtosis: 696.4651208
Mean: 1764.006007
Median Absolute Deviation (MAD): 480.515
Skewness: 22.98688033
Sum: 10019554.12
Variance: 56634606.55
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                  Count  Frequency (%)
7.95                   9      0.2%
2.95                   8      0.1%
1.25                   8      0.1%
4.95                   8      0.1%
1.65                   7      0.1%
12.75                  7      0.1%
3.75                   7      0.1%
7.5                    6      0.1%
5.95                   6      0.1%
4.25                   6      0.1%
Other values (5428)    5608   98.7%

Minimum 10 values

Value                  Count  Frequency (%)
0.42                   1      < 0.1%
0.65                   1      < 0.1%
0.79                   1      < 0.1%
0.84                   4      0.1%
0.85                   3      0.1%
1.07                   1      < 0.1%
1.25                   8      0.1%
1.44                   1      < 0.1%
1.65                   7      0.1%
1.69                   1      < 0.1%

Maximum 10 values

Value                  Count  Frequency (%)
279138.02              1      < 0.1%
259657.3               1      < 0.1%
194550.79              1      < 0.1%
140450.72              1      < 0.1%
124564.53              1      < 0.1%
117379.63              1      < 0.1%
91062.38               1      < 0.1%
72882.09               1      < 0.1%
66653.56               1      < 0.1%
65039.62               1      < 0.1%

recency_days
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 304
Distinct (%): 5.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 116.8264085
Minimum: 0
Maximum: 373
Zeros: 37
Zeros (%): 0.7%
Negative: 0
Negative (%): 0.0%
Memory size: 44.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 3
Q1: 22
Median: 71
Q3: 199.25
95-th percentile: 338
Maximum: 373
Range: 373
Interquartile range (IQR): 177.25

Descriptive statistics

Standard deviation: 111.6124711
Coefficient of variation (CV): 0.9553702158
Kurtosis: -0.640424192
Mean: 116.8264085
Median Absolute Deviation (MAD): 61
Skewness: 0.8152565497
Sum: 663574
Variance: 12457.34369
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                  Count  Frequency (%)
1                      110    1.9%
4                      105    1.8%
3                      98     1.7%
2                      92     1.6%
10                     86     1.5%
8                      82     1.4%
17                     79     1.4%
9                      79     1.4%
7                      77     1.4%
15                     66     1.2%
Other values (294)     4806   84.6%

Minimum 10 values

Value                  Count  Frequency (%)
0                      37     0.7%
1                      110    1.9%
2                      92     1.6%
3                      98     1.7%
4                      105    1.8%
5                      52     0.9%
7                      77     1.4%
8                      82     1.4%
9                      79     1.4%
10                     86     1.5%

Maximum 10 values

Value                  Count  Frequency (%)
373                    23     0.4%
372                    22     0.4%
371                    17     0.3%
369                    4      0.1%
368                    13     0.2%
367                    16     0.3%
366                    15     0.3%
365                    19     0.3%
364                    11     0.2%
362                    7      0.1%

purchases_no
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 56
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.477464789
Minimum: 1
Maximum: 206
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 44.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 4
95-th percentile: 11
Maximum: 206
Range: 205
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 6.821203018
Coefficient of variation (CV): 1.961544813
Kurtosis: 301.4334068
Mean: 3.477464789
Median Absolute Deviation (MAD): 0
Skewness: 13.1790814
Sum: 19752
Variance: 46.52881062
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                  Count  Frequency (%)
1                      2858   50.3%
2                      823    14.5%
3                      503    8.9%
4                      394    6.9%
5                      237    4.2%
6                      173    3.0%
7                      138    2.4%
8                      98     1.7%
9                      69     1.2%
10                     55     1.0%
Other values (46)      332    5.8%

Minimum 10 values

Value                  Count  Frequency (%)
1                      2858   50.3%
2                      823    14.5%
3                      503    8.9%
4                      394    6.9%
5                      237    4.2%
6                      173    3.0%
7                      138    2.4%
8                      98     1.7%
9                      69     1.2%
10                     55     1.0%

Maximum 10 values

Value                  Count  Frequency (%)
206                    1      < 0.1%
199                    1      < 0.1%
124                    1      < 0.1%
97                     1      < 0.1%
91                     2      < 0.1%
86                     1      < 0.1%
72                     1      < 0.1%
62                     2      < 0.1%
60                     1      < 0.1%
57                     1      < 0.1%

products_no
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 529
Distinct (%): 9.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 92.81619718
Minimum: 1
Maximum: 7838
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 44.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 14
Median: 41
Q3: 107
95-th percentile: 333
Maximum: 7838
Range: 7837
Interquartile range (IQR): 93

Descriptive statistics

Standard deviation: 210.8144046
Coefficient of variation (CV): 2.271310514
Kurtosis: 509.2830005
Mean: 92.81619718
Median Absolute Deviation (MAD): 33
Skewness: 17.73785676
Sum: 527196
Variance: 44442.71317
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                  Count  Frequency (%)
1                      253    4.5%
2                      148    2.6%
3                      107    1.9%
10                     100    1.8%
6                      98     1.7%
9                      92     1.6%
5                      90     1.6%
4                      87     1.5%
7                      82     1.4%
8                      81     1.4%
Other values (519)     4542   80.0%

Minimum 10 values

Value                  Count  Frequency (%)
1                      253    4.5%
2                      148    2.6%
3                      107    1.9%
4                      87     1.5%
5                      90     1.6%
6                      98     1.7%
7                      82     1.4%
8                      81     1.4%
9                      92     1.6%
10                     100    1.8%

Maximum 10 values

Value                  Count  Frequency (%)
7838                   1      < 0.1%
5673                   1      < 0.1%
5095                   1      < 0.1%
4580                   1      < 0.1%
2698                   1      < 0.1%
2379                   1      < 0.1%
2060                   1      < 0.1%
1818                   1      < 0.1%
1673                   1      < 0.1%
1637                   1      < 0.1%

items_no
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct: 1838
Distinct (%): 32.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 953.1933099
Minimum: 1
Maximum: 196844
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 44.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 4.95
Q1: 106
Median: 317.5
Q3: 805.25
95-th percentile: 2927.8
Maximum: 196844
Range: 196843
Interquartile range (IQR): 699.25

Descriptive statistics

Standard deviation: 4194.544266
Coefficient of variation (CV): 4.400517946
Kurtosis: 940.45997
Mean: 953.1933099
Median Absolute Deviation (MAD): 253.5
Skewness: 25.07233183
Sum: 5414138
Variance: 17594201.6
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                  Count  Frequency (%)
1                      113    2.0%
2                      71     1.2%
3                      51     0.9%
4                      49     0.9%
5                      35     0.6%
6                      29     0.5%
12                     24     0.4%
88                     21     0.4%
72                     21     0.4%
7                      20     0.4%
Other values (1828)    5246   92.4%

Minimum 10 values

Value                  Count  Frequency (%)
1                      113    2.0%
2                      71     1.2%
3                      51     0.9%
4                      49     0.9%
5                      35     0.6%
6                      29     0.5%
7                      20     0.4%
8                      18     0.3%
9                      7      0.1%
10                     17     0.3%

Maximum 10 values

Value                  Count  Frequency (%)
196844                 1      < 0.1%
80263                  1      < 0.1%
77373                  1      < 0.1%
69993                  1      < 0.1%
64549                  1      < 0.1%
64124                  1      < 0.1%
63312                  1      < 0.1%
58343                  1      < 0.1%
57885                  1      < 0.1%
50255                  1      < 0.1%

frequency
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1226
Distinct (%): 21.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5464380088
Minimum: 0.005449591281
Maximum: 17
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 44.5 KiB

Quantile statistics

Minimum: 0.005449591281
5-th percentile: 0.01102941176
Q1: 0.02491103203
Median: 1
Q3: 1
95-th percentile: 1
Maximum: 17
Range: 16.99455041
Interquartile range (IQR): 0.975088968

Descriptive statistics

Standard deviation: 0.5504756876
Coefficient of variation (CV): 1.007389088
Kurtosis: 139.3159401
Mean: 0.5464380088
Median Absolute Deviation (MAD): 0
Skewness: 4.869368496
Sum: 3103.76789
Variance: 0.3030234826
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                  Count  Frequency (%)
1                      2866   50.5%
2                      47     0.8%
0.02777777778          17     0.3%
0.0625                 17     0.3%
0.02380952381          16     0.3%
0.08333333333          15     0.3%
0.09090909091          15     0.3%
0.02941176471          14     0.2%
0.03448275862          14     0.2%
0.02127659574          13     0.2%
Other values (1216)    2646   46.6%

Minimum 10 values

Value                  Count  Frequency (%)
0.005449591281         1      < 0.1%
0.005464480874         1      < 0.1%
0.005479452055         1      < 0.1%
0.005494505495         1      < 0.1%
0.005586592179         2      < 0.1%
0.005602240896         1      < 0.1%
0.005617977528         2      < 0.1%
0.00566572238          1      < 0.1%
0.005681818182         2      < 0.1%
0.005698005698         3      0.1%

Maximum 10 values

Value                  Count  Frequency (%)
17                     1      < 0.1%
4                      1      < 0.1%
3                      5      0.1%
2                      47     0.8%
1.142857143            1      < 0.1%
1                      2866   50.5%
0.75                   1      < 0.1%
0.6666666667           3      0.1%
0.550802139            1      < 0.1%
0.5335120643           1      < 0.1%

returns_no
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 205
Distinct (%): 3.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 17.44542254
Minimum: 0
Maximum: 9014
Zeros: 4191
Zeros (%): 73.8%
Negative: 0
Negative (%): 0.0%
Memory size: 44.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 1
95-th percentile: 37.05
Maximum: 9014
Range: 9014
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 203.6577215
Coefficient of variation (CV): 11.67399191
Kurtosis: 1171.105899
Mean: 17.44542254
Median Absolute Deviation (MAD): 0
Skewness: 30.93860988
Sum: 99090
Variance: 41476.46753
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                  Count  Frequency (%)
0                      4191   73.8%
1                      169    3.0%
2                      148    2.6%
3                      105    1.8%
4                      89     1.6%
6                      78     1.4%
5                      61     1.1%
12                     51     0.9%
7                      44     0.8%
8                      43     0.8%
Other values (195)     701    12.3%

Minimum 10 values

Value                  Count  Frequency (%)
0                      4191   73.8%
1                      169    3.0%
2                      148    2.6%
3                      105    1.8%
4                      89     1.6%
5                      61     1.1%
6                      78     1.4%
7                      44     0.8%
8                      43     0.8%
9                      41     0.7%

Maximum 10 values

Value                  Count  Frequency (%)
9014                   1      < 0.1%
8004                   1      < 0.1%
4427                   1      < 0.1%
3768                   1      < 0.1%
3332                   1      < 0.1%
2878                   1      < 0.1%
2022                   1      < 0.1%
2012                   1      < 0.1%
1776                   1      < 0.1%
1594                   1      < 0.1%

satisfaction_rate
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1377
Distinct (%): 24.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.9897925099
Minimum: 0.01369863014
Maximum: 1
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 44.5 KiB

Quantile statistics

Minimum: 0.01369863014
5-th percentile: 0.9607100675
Q1: 0.9990186393
Median: 1
Q3: 1
95-th percentile: 1
Maximum: 1
Range: 0.9863013699
Interquartile range (IQR): 0.0009813606672

Descriptive statistics

Standard deviation: 0.04849027029
Coefficient of variation (CV): 0.04899033868
Kurtosis: 104.431706
Mean: 0.9897925099
Median Absolute Deviation (MAD): 0
Skewness: -9.067051101
Sum: 5622.021456
Variance: 0.002351306312
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                  Count  Frequency (%)
1                      4191   73.8%
0.9903381643           4      0.1%
0.9892473118           4      0.1%
0.9756097561           3      0.1%
0.987804878            3      0.1%
0.9761904762           3      0.1%
0.9907407407           3      0.1%
0.9589041096           3      0.1%
0.9976905312           3      0.1%
0.9863945578           3      0.1%
Other values (1367)    1460   25.7%

Minimum 10 values

Value                  Count  Frequency (%)
0.01369863014          1      < 0.1%
0.1666666667           1      < 0.1%
0.3666666667           1      < 0.1%
0.3884892086           1      < 0.1%
0.3991163476           1      < 0.1%
0.4035433071           1      < 0.1%
0.4351145038           1      < 0.1%
0.4353612167           1      < 0.1%
0.4397959184           1      < 0.1%
0.4600938967           1      < 0.1%

Maximum 10 values

Value                  Count  Frequency (%)
1                      4191   73.8%
0.9998830364           1      < 0.1%
0.9998160074           1      < 0.1%
0.9997183099           1      < 0.1%
0.9996859296           1      < 0.1%
0.9996380746           1      < 0.1%
0.9996367599           1      < 0.1%
0.9996362314           1      < 0.1%
0.9996328928           1      < 0.1%
0.9996069182           1      < 0.1%

Interactions

[Figure: pairwise interaction plots for each pair of variables; images not preserved.]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
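
A minimal sketch of that calculation in plain Python follows. The helper names (`average_ranks`, `pearson`, `spearman_rho`) and the sample data are hypothetical, and tied values are given the mean of their rank positions, which is a common convention but an assumption here.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]   # monotonic but nonlinear in x
print(spearman_rho(x, y))  # 1.0
```

Because only ranks enter the calculation, the cubic relationship above still yields ρ = 1.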

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
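
The sketch below illustrates this formula and the invariance under changes of location and scale. The function name `pearson_r` and the sample data are hypothetical; the 1/n (or 1/(n-1)) factors of the covariance and standard deviations cancel, so they are omitted.

```python
import math

def pearson_r(x, y):
    """Pearson's r: covariance over the product of standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r = pearson_r(x, y)
# Shift and rescale each variable separately (positive scale factors).
r_rescaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])

print(round(r, 4), round(r_rescaled, 4))  # 0.7746 0.7746
```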

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
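
The pair counting described above can be sketched as follows. This is the simple variant often called tau-a, which makes no correction for ties; statistical libraries commonly apply a tie-corrected variant, so results can differ when ties are present. The function name `kendall_tau_a` and the sample data are hypothetical.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, without tie correction."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1      # both variables move in the same direction
        elif product < 0:
            discordant += 1      # the variables move in opposite directions
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

x = [1, 2, 3, 4, 5]
y = [3, 1, 2, 5, 4]
print(kendall_tau_a(x, y))  # 0.4  (7 concordant, 3 discordant, 10 pairs)
```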

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. The Phik project provides extensive documentation.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completeness.

Sample

First rows

      df_index  customer_id  gross_revenue  recency_days  purchases_no  products_no  items_no  frequency  returns_no  satisfaction_rate
0     0         17850        5391.21        372.0         34.0          297.0        1733.0    17.000000  40.0        0.976919
1     1         13047        3232.59        56.0          9.0           171.0        1390.0    0.028302   35.0        0.974820
2     2         12583        6705.38        2.0           15.0          232.0        5028.0    0.040323   50.0        0.990056
3     3         13748        948.25         95.0          5.0           28.0         439.0     0.017921   0.0         1.000000
4     4         15100        876.00         333.0         3.0           3.0          80.0      0.073171   22.0        0.725000
5     5         15291        4623.30        25.0          14.0          102.0        2102.0    0.040115   29.0        0.986204
6     6         14688        5630.87        7.0           21.0          327.0        3621.0    0.057221   399.0       0.889809
7     7         17809        5411.91        16.0          12.0          61.0         2057.0    0.033520   41.0        0.980068
8     8         15311        60767.90       0.0           91.0          2379.0       38194.0   0.243316   474.0       0.987590
9     9         16098        2005.63        87.0          7.0           67.0         613.0     0.024390   0.0         1.000000

Last rows

      df_index  customer_id  gross_revenue  recency_days  purchases_no  products_no  items_no  frequency  returns_no  satisfaction_rate
5670  5761      22700        4839.42        1.0           1.0           62.0         1074.0    1.0        0.0         1.0
5671  5762      13298        360.00         1.0           1.0           2.0          96.0      1.0        0.0         1.0
5672  5763      14569        227.39         1.0           1.0           12.0         79.0      1.0        0.0         1.0
5673  5764      22704        17.90          1.0           1.0           7.0          14.0      1.0        0.0         1.0
5674  5765      22705        3.35           1.0           1.0           2.0          2.0       1.0        0.0         1.0
5675  5766      22706        5699.00        1.0           1.0           634.0        1747.0    1.0        0.0         1.0
5676  5767      22707        6756.06        0.0           1.0           730.0        2010.0    1.0        0.0         1.0
5677  5768      22708        3217.20        0.0           1.0           59.0         654.0     1.0        0.0         1.0
5678  5769      22709        3950.72        0.0           1.0           217.0        731.0     1.0        0.0         1.0
5679  5770      12713        794.55         0.0           1.0           37.0         505.0     1.0        0.0         1.0